3 research outputs found

XGBOD: Improving Supervised Outlier Detection with Unsupervised Representation Learning

Authors: Hryniewicki Maciej K.; Zhao Yue
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 30/11/2019

A new semi-supervised ensemble algorithm called XGBOD (Extreme Gradient Boosting Outlier Detection) is proposed, described and demonstrated for the enhanced detection of outliers from normal observations in various practical datasets. The proposed framework combines the strengths of both supervised and unsupervised machine learning methods by creating a hybrid approach that exploits each of their individual performance capabilities in outlier detection. XGBOD uses multiple unsupervised outlier mining algorithms to extract useful representations from the underlying data that augment the predictive capabilities of an embedded supervised classifier on an improved feature space. The novel approach is shown to provide superior performance in comparison to competing individual detectors, the full ensemble and two existing representation-learning-based algorithms across seven outlier datasets.

Comment: Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN)
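The feature-augmentation step described in this abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the k-nearest-neighbour distance detector, the function names, the values of `ks`, and the toy data are all assumptions made for illustration. Each unsupervised detector contributes one extra column of outlier scores to the original features.

```python
import math


def knn_distance_scores(points, k):
    """Unsupervised outlier score: distance to the k-th nearest neighbour.

    Stand-in detector chosen for this sketch (an assumption, not taken
    from the abstract).
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def augment_features(points, ks=(1, 2, 3)):
    """Append one outlier-score column per unsupervised detector."""
    columns = [knn_distance_scores(points, k) for k in ks]
    return [list(p) + [col[i] for col in columns] for i, p in enumerate(points)]


# Hypothetical toy data: a tight cluster plus one distant point.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
X_aug = augment_features(X)

for row in X_aug:
    print([round(v, 3) for v in row])
```

In the approach the abstract describes, the augmented rows, together with labels, would then be used to train the embedded supervised classifier. The algorithm's name points to extreme gradient boosting, which is left out here to keep the sketch free of third-party packages.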

arXiv.org e-Print Archive

Crossref

LSCP: Locally Selective Combination in Parallel Outlier Ensembles

Authors: Hryniewicki Maciej K.; Li Zheng; Nasrullah Zain; Zhao Yue
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 22/01/2019

In unsupervised outlier ensembles, the absence of ground truth makes the combination of base outlier detectors a challenging task. Specifically, existing parallel outlier ensembles lack a reliable way of selecting competent base detectors during model combination, which affects accuracy and stability. In this paper, we propose a framework, called Locally Selective Combination in Parallel Outlier Ensembles (LSCP), which addresses the issue by defining a local region around a test instance using the consensus of its nearest neighbors in randomly selected feature subspaces. The top-performing base detectors in this local region are selected and combined as the model's final output. Four variants of the LSCP framework are compared with seven widely used parallel frameworks. Experimental results demonstrate that one of these variants, LSCP_AOM, consistently outperforms baselines on the majority of twenty real-world datasets.

Comment: Proceedings of the 2019 SIAM International Conference on Data Mining (SDM)
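The local-selection idea in this abstract can be sketched in a few lines under some simplifying assumptions. The local region is taken here as the plain k nearest training points in the full feature space, rather than a consensus over random feature subspaces. The abstract does not say how detector competence is measured, so this sketch assumes Pearson correlation against a pseudo ground truth formed by the maximum of the detectors' training scores. Averaging the selected detectors' scores is likewise an assumed combination rule. The function names, parameters, and toy numbers are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def lscp_score(train, train_scores, x, x_scores, k=3, n_select=2):
    """Combine the locally best detectors for test instance x.

    train_scores[d][i] is detector d's score on training point i;
    x_scores[d] is detector d's score on x.
    """
    # 1. Local region: indices of the k training points closest to x.
    region = sorted(range(len(train)), key=lambda i: math.dist(train[i], x))[:k]
    # 2. Pseudo ground truth on the region (assumed: max over detectors).
    pseudo = [max(s[i] for s in train_scores) for i in region]
    # 3. Rank detectors by local agreement with the pseudo ground truth.
    corr = [pearson([s[i] for i in region], pseudo) for s in train_scores]
    order = sorted(range(len(train_scores)), key=lambda d: corr[d], reverse=True)
    best = order[:n_select]
    # 4. Combine the selected detectors' scores on x (assumed: average).
    return sum(x_scores[d] for d in best) / len(best), best


# Hypothetical toy inputs: five training points, three base detectors.
train = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)]
train_scores = [
    [1, 2, 9, 5, 5],  # detector 0
    [4, 3, 2, 9, 1],  # detector 1
    [2, 3, 8, 1, 1],  # detector 2
]
score, chosen = lscp_score(train, train_scores, x=(0.5, 0.5), x_scores=[7, 2, 5])
print(chosen, score)
```

With these numbers the local region is the three points near the origin, so detectors 0 and 2 are selected because they track the pseudo ground truth there, while detector 1 is anti-correlated with it and is left out.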

arXiv.org e-Print Archive

Crossref